在本文中,我们提出了一个多目标相机ISP框架,该框架利用深入增强学习(DRL)和由基于网络和常规ISP工具组成的相机ISP工具箱。提出的基于DRL的相机ISP框架迭代从工具箱中选择一个合适的工具,并将其应用于图像以最大化给定的特定于任务特定的奖励功能。为此,我们总共实施了51个ISP工具,包括曝光校正,颜色和色调校正,白平衡,锐化,脱氧和其他工具。我们还提出了一个有效的DRL网络体系结构,该体系结构可以提取图像的各个方面,并在图像和大量操作之间建立刚性映射的关系。我们提出的基于DRL的ISP框架可有效地根据每个视觉任务(例如原始to-RGB图像恢复,2D对象检测和单眼深度估计)提高图像质量。
translated by 谷歌翻译
最近,从热图像的深度和自我运动的自我监督学习在具有挑战性的情景下,热量图像的深度和自我运动的学习表现出强大的鲁棒性和可靠性。然而,诸如弱对比度,模糊边缘和噪声阻碍的固有的热图像属性,以产生从热图像产生有效的自我监督。因此,大多数研究依赖于额外的自我监督源,例如LOT-LIT RGB图像,生成模型和LIDAR信息。在本文中,我们对热图像特性进行了深入的分析,从而从热图像退化自我监督。基于分析,我们提出了一种有效的热图像映射方法,其显着增加了图像信息,例如整体结构,对比度和细节,同时保持时间一致性。所提出的方法显示出比以前的最先进的网络的表现优于优势和姿势,而不利用额外的RGB引导。
translated by 谷歌翻译
学习估计对象姿势通常需要地面真理(GT)标签,例如CAD模型和绝对级对象姿势,这在现实世界中获得昂贵且费力。为了解决这个问题,我们为类别级对象姿势估计提出了一个无监督的域适应(UDA),称为\ textbf {uda-cope}。受到最近的多模态UDA技术的启发,所提出的方法利用教师学生自我监督的学习方案来训练姿势估计网络而不使用目标域标签。我们还在预测归一化对象坐标空间(NOCS)地图和观察点云之间引入了双向滤波方法,不仅使我们的教师网络更加强大地对目标域,而且为学生网络培训提供更可靠的伪标签。广泛的实验结果表明了我们所提出的方法的有效性,可以定量和定性。值得注意的是,在不利用目标域GT标签的情况下,我们所提出的方法可以实现与依赖于GT标签的现有方法相当或有时优越的性能。
translated by 谷歌翻译
热摄像机可以在恶劣的光线条件下(例如夜面场景,隧道和灾难场景)强烈捕获热辐射图像。但是,尽管有这一优势,但迄今为止,尚未对热摄像机的深度和自我运动估计研究进行积极探索。在本文中,我们提出了一种从热图像中进行深度和自我运动估计的自制学习方法。所提出的方法利用了由温度和光度一致性损失组成的多光谱一致性。温度一致性损失通过重建剪辑和有色人种的热图像来提供基本的自我预见信号。此外,我们设计了一个可区分的前向翘曲模块,该模块可以将估计深度图的坐标系和从热摄像机转换为可见摄像头的坐标系。基于提出的模块,光度一致性损失可以为网络提供互补的自学。经过提议的方法训练的网络可鲁棒地估计在弱光甚至零灯条件下单眼热视频的深度和姿势。据我们所知,这是第一项以自我监督的方式同时估算单眼热视频的深度和自我感动的作品。
translated by 谷歌翻译
Understanding the informative structures of scenes is essential for low-level vision tasks. Unfortunately, it is difficult to obtain a concrete visual definition of the informative structures because influences of visual features are task-specific. In this paper, we propose a single general neural network architecture for extracting task-specific structure guidance for scenes. To do this, we first analyze traditional spectral clustering methods, which computes a set of eigenvectors to model a segmented graph forming small compact structures on image domains. We then unfold the traditional graph-partitioning problem into a learnable network, named \textit{Scene Structure Guidance Network (SSGNet)}, to represent the task-specific informative structures. The SSGNet yields a set of coefficients of eigenvectors that produces explicit feature representations of image structures. In addition, our SSGNet is light-weight ($\sim$ 55K parameters), and can be used as a plug-and-play module for off-the-shelf architectures. We optimize the SSGNet without any supervision by proposing two novel training losses that enforce task-specific scene structure generation during training. Our main contribution is to show that such a simple network can achieve state-of-the-art results for several low-level vision applications including joint upsampling and image denoising. We also demonstrate that our SSGNet generalizes well on unseen datasets, compared to existing methods which use structural embedding frameworks. Our source codes are available at https://github.com/jsshin98/SSGNet.
translated by 谷歌翻译
In many domains such as transportation and logistics, search and rescue, or cooperative surveillance, tasks are pending to be allocated with the consideration of possible execution uncertainties. Existing task coordination algorithms either ignore the stochastic process or suffer from the computational intensity. Taking advantage of the weakly coupled feature of the problem and the opportunity for coordination in advance, we propose a decentralized auction-based coordination strategy using a newly formulated score function which is generated by forming the problem into task-constrained Markov decision processes (MDPs). The proposed method guarantees convergence and at least 50% optimality in the premise of a submodular reward function. Furthermore, for the implementation on large-scale applications, an approximate variant of the proposed method, namely Deep Auction, is also suggested with the use of neural networks, which is evasive of the troublesome for constructing MDPs. Inspired by the well-known actor-critic architecture, two Transformers are used to map observations to action probabilities and cumulative rewards respectively. Finally, we demonstrate the performance of the two proposed approaches in the context of drone deliveries, where the stochastic planning for the drone league is cast into a stochastic price-collecting Vehicle Routing Problem (VRP) with time windows. Simulation results are compared with state-of-the-art methods in terms of solution quality, planning efficiency and scalability.
translated by 谷歌翻译
In robotics and computer vision communities, extensive studies have been widely conducted regarding surveillance tasks, including human detection, tracking, and motion recognition with a camera. Additionally, deep learning algorithms are widely utilized in the aforementioned tasks as in other computer vision tasks. Existing public datasets are insufficient to develop learning-based methods that handle various surveillance for outdoor and extreme situations such as harsh weather and low illuminance conditions. Therefore, we introduce a new large-scale outdoor surveillance dataset named eXtremely large-scale Multi-modAl Sensor dataset (X-MAS) containing more than 500,000 image pairs and the first-person view data annotated by well-trained annotators. Moreover, a single pair contains multi-modal data (e.g. an IR image, an RGB image, a thermal image, a depth image, and a LiDAR scan). This is the first large-scale first-person view outdoor multi-modal dataset focusing on surveillance tasks to the best of our knowledge. We present an overview of the proposed dataset with statistics and present methods of exploiting our dataset with deep learning-based algorithms. The latest information on the dataset and our study are available at https://github.com/lge-robot-navi, and the dataset will be available for download through a server.
translated by 谷歌翻译
Task-oriented dialogue systems often assist users with personal or confidential matters. For this reason, the developers of such a system are generally prohibited from observing actual usage. So how can they know where the system is failing and needs more training data or new functionality? In this work, we study ways in which realistic user utterances can be generated synthetically, to help increase the linguistic and functional coverage of the system, without compromising the privacy of actual users. To this end, we propose a two-stage Differentially Private (DP) generation method which first generates latent semantic parses, and then generates utterances based on the parses. Our proposed approach improves MAUVE by 3.8$\times$ and parse tree node-type overlap by 1.4$\times$ relative to current approaches for private synthetic data generation, improving both on fluency and semantic coverage. We further validate our approach on a realistic domain adaptation task of adding new functionality from private user data to a semantic parser, and show gains of 1.3$\times$ on its accuracy with the new feature.
translated by 谷歌翻译
Task-oriented dialogue (TOD) systems are mainly based on the slot-filling-based TOD (SF-TOD) framework, in which dialogues are broken down into smaller, controllable units (i.e., slots) to fulfill a specific task. A series of approaches based on this framework achieved remarkable success on various TOD benchmarks. However, we argue that the current TOD benchmarks are limited to surrogate real-world scenarios and that the current TOD models are still a long way from unraveling the scenarios. In this position paper, we first identify current status and limitations of SF-TOD systems. After that, we explore the WebTOD framework, the alternative direction for building a scalable TOD system when a web/mobile interface is available. In WebTOD, the dialogue system learns how to understand the web/mobile interface that the human agent interacts with, powered by a large-scale language model.
translated by 谷歌翻译
Any classifier can be "smoothed out" under Gaussian noise to build a new classifier that is provably robust to $\ell_2$-adversarial perturbations, viz., by averaging its predictions over the noise via randomized smoothing. Under the smoothed classifiers, the fundamental trade-off between accuracy and (adversarial) robustness has been well evidenced in the literature: i.e., increasing the robustness of a classifier for an input can be at the expense of decreased accuracy for some other inputs. In this paper, we propose a simple training method leveraging this trade-off to obtain robust smoothed classifiers, in particular, through a sample-wise control of robustness over the training samples. We make this control feasible by using "accuracy under Gaussian noise" as an easy-to-compute proxy of adversarial robustness for an input. Specifically, we differentiate the training objective depending on this proxy to filter out samples that are unlikely to benefit from the worst-case (adversarial) objective. Our experiments show that the proposed method, despite its simplicity, consistently exhibits improved certified robustness upon state-of-the-art training methods. Somewhat surprisingly, we find these improvements persist even for other notions of robustness, e.g., to various types of common corruptions.
translated by 谷歌翻译